Elliphant: Improved Automatic Detection of Zero Subjects and Impersonal Constructions in Spanish

Authors

  • Luz Rello
  • Ricardo A. Baeza-Yates
  • Ruslan Mitkov
Abstract

In pro-drop languages, the detection of explicit subjects, zero subjects and nonreferential impersonal constructions is crucial for anaphora and co-reference resolution. While the identification of explicit and zero subjects has attracted the attention of researchers in the past, the automatic identification of impersonal constructions in Spanish has not been addressed yet and this work is the first such study. In this paper we present a corpus to underpin research on the automatic detection of these linguistic phenomena in Spanish and a novel machine learning-based methodology for their computational treatment. This study also provides an analysis of the features, discusses performance across two different genres and offers error analysis. The evaluation results show that our system performs better in detecting explicit subjects than alternative systems.

Related resources

Elliphant: A Machine Learning Method for Identifying Subject Ellipsis and Impersonal Constructions in Spanish

This thesis presents Elliphant, a machine learning system for classifying Spanish subject ellipsis as either referential or non-referential. Linguistically motivated features are incorporated in a system which performs a ternary classification: verbs with explicit subjects, verbs with omitted but referential subjects (zero pronouns), and verbs with no subject (impersonal constructions). To the ...

A First Approach to the Automatic Detection of Zero Subjects and Impersonal Constructions in Portuguese

In this paper we present a first approach to the automatic detection of zero subjects and impersonal constructions in Brazilian Portuguese. To the best of our knowledge, this is the first attempt to approach this task using machine learning in Portuguese. We compiled a corpus containing more than 5,600 instances annotated with the classes to be identified: explicit subjects, zero subjec...

A Portuguese-Spanish Corpus Annotated for Subject Realization and Referentiality

This paper presents a comparable corpus of Portuguese and Spanish consisting of legal and health texts. We describe the annotation of zero subject, impersonal constructions and explicit subjects in the corpus. We annotated 12,492 examples using a scheme that distinguishes between different linguistic levels (phonology, syntax, semantics, etc.) and present a taxonomy of instances on which annota...

A machine learning method for identifying impersonal constructions and zero pronouns in Spanish (Un método de aprendizaje automático para la identificación de construcciones impersonales y pronombres cero en español)

In this paper, we present a machine learning system for classifying subject ellipsis in Spanish as either referential or non-referential. To the best of our knowledge, this is the first attempt to automatically identify non-referential ellipsis in Spanish. An evaluation of our system against 6,827 finite verbs shows an accuracy of 87%.

Passives and impersonals

Passive and impersonal constructions have a strikingly different status in current theoretical and descriptive studies. All formal approaches recognize passive constructions and provide some means of relating their properties to those of corresponding actives. Any framework that did not would be considered fundamentally deficient or incomplete. Many descriptive grammars likewise apply a broad n...

Publication date: 2012